Kalman filtering to reduce measurement noise of sample entropy: An electroencephalographic study

In the analysis of electroencephalography (EEG), entropy can be used to quantify the rate of generation of new information. Entropy has long been known to suffer from variance that arises from its calculation. From a sensor’s perspective, calculation of entropy from a period of EEG recording can be treated as physical measurement, which suffers from measurement noise. We showed the feasibility of using Kalman filtering to reduce the variance of entropy for simulated signals as well as real-world EEG recordings. In addition, we also manifested that Kalman filtering was less time-consuming than moving average, and had better performance than moving average and exponentially weighted moving average. In conclusion, we have treated entropy as a physical measure and successfully applied the conventional Kalman filtering with fixed hyperparameters. Kalman filtering is expected to be used to reduce measurement noise when continuous entropy estimation (for example anaesthesia monitoring) is essential with high accuracy and low time-consumption.

Most studies on the entropy measures of EEG focus on the values of entropy at some specific moments while there exist some circumstances that emphasize the evolving pattern.For example, Liang et al. use entropy measures to monitor depth of anesthesia [5].Kbah et al. use entropy-based biomarkers to monitor epileptic EEG activity [10].Dı ´az et al. show entropy dynamic map of EEG in resting conditions [26].
Calculation of entropy can be conceived as a physical sensor to measure the irregularity of the time series in question.Like any other measure of sensors [27], sample entropy also suffers from the inherent measurement noise, which can be estimated numerically [28].The measurement noise of entropy should be reduced to achieve more accurate estimation.Conventional smoothing methods, moving average [29], exponentially weighted moving average (EWMA) [30] can be applied to the continuously computed (measured) entropy values with a sliding window.The two methods may suffer from high computational cost or low performance, which may not be optimal when online monitoring of entropy measures is required.
As a powerful technology for estimating the states of a dynamic system, the Kalman filtering is usually applied to the recorded time series, e.g., EEG [31], electrocardiogram [32], positioning in global positioning system [33] and drone tracking [34].
In this paper, we propose a measurement noise-reducing method for entropy, in which, the Kalman filtering operates on the continuous calculated entropy values of EEG time series with non-overlapping sliding windows.We test this method on simulated signals (power noise, Logistic map signals and Ro ¨ssler system signals) as well as EEG recordings from publicly available datasets (sleep EEG, and EEG recordings from pediatric subjects with refractory seizures).We also compare the smoothing effects and computational costs among three smoothing methods.We also study the effects of hyperparameters of Kalman filtering on the variance reduction.

Datasets
Simulation signal generation.First, we generated power-law time series (also called power noise) with power spectrum of 1/f β following [35].The signals were generated with known β: 0, 0.5, 1, representing white noise, pink noise and 1/f noise respectively.Each time series contains 50,000 data points at a sampling rate of 100 Hz.The power noise signals were generated using the Matlab toolbox powernoise.m provided in [36].
Logistic map signals are often used [6] to compare entropy measures to the original work by Costa et al [37].Logistic map signal can be defined by We generated logistic map signals using parameters r = 3.57, 3.77 and 3.9.
Ro ¨ssler system signals can also be used as a test dataset to assess entropy related properties [35].A Ro ¨ssler system is expressed as [38] We generated Ro ¨ssler system time seires using parameters a = 0.38, b = 0.2, c = 5.7 following [35] and two additional parameters c = 2.5 and 4.
Real-world dataset.In this study, two EEG datasets were analyzed.The first EEG dataset is the Sleep-EDF Expanded Database open sleep dataset published on physioNet.One hundred and fifty three SC* files (SC, sleep cassette) were obtained in a 1987-1991 study on the age effects [39,40].Polysomnograms were recorded twice, for approximately 20 hours each time, at a sampling rate of 100 Hz.Polysomnograms contained Fpz-Cz-and Pz-Oz-based EEG signals, horizontal electrooculogram signals, sub-chin electromyography, and event markers.All polysomnograms were manually scored by trained technicians according to the 1968 Rechtschaffen and Kales manual.Polysomnograms included sleep stages 'W', 'R', '1', '2', '3', '4', 'M' and '?', representing Wake, REM, S1, S2, S3, S4, movement and unlabeled, respectively.In our study, in order to investigate how Kalman filtering can reduce the variance of entropy on slower changing sleep data, we chose to analyze the same channels in different people.The window length for entropy calculation is 5 times the sampling frequency (100 Hz), i.e., 500.
Our second dataset was derived from the CHB-MIT Scalp EEG Database, a collection of EEG recordings from 22 (5 males, 17 females) pediatric subjects with refractory seizures collected by Children's Hospital Boston [41,42].Detailed descriptions of these samples are on the physioNet website.The subjects were monitored for up to several days after discontinuation of antiepileptic drugs.Continuous EEG data were recorded for each subject.A total of 182 seizure onset and end times were recorded.All data were acquired at a sampling rate of 256 Hz with 16-bit resolution.These recordings were made using the International 10-20 EEG Electrode Location and Naming System.In this study, in order to investigate the performance of Kalman filtering in reducing the variance of entropy for transient EEG activities, we chose to analyze EEG segments of one-hour that contain seizures from channels FP1-F3, F3-C3, C3-P3 and P3-O1 in the same subject (chb03).The window length for entropy calculation is 5 times the sampling frequency (256 Hz), i.e., 1280.

Sample entropy
Sample entropy is invented by Richman et al [43] and is briefly summarized here to show the principles and parameter choice.Let the raw data sampled at equal event intervals be u(i), i = 1, 2, � � �, N. First, Reconstructing m-dimensional vectors x(1), x(2), � � � x(N − M + 1), where Define the distance d[x(i), x(j)] between x(i) and x(j) to be the one with the largest difference between the corresponding elements of the two, i.e., where u(a) is an element in the vector x.For each value of i calculate the distance d[x(i), x(j)] between x(i) and the other vectors x(j), j = 1, 2, � � �, N − m + 1.
According to the given threshold r(r > 0), for 1 ⩽ i ⩽ N − m + 1, the number of d[x(i), x(j)] < r is counted for each value of i and the ratio to the total number of vectors N − m is denoted as B m i ðmÞ.
The average of the B m i ðmÞ over all values of i, denoted B m (r), i.e., Then increase the dimension m to m + 1 to get B mþ1 i ðrÞ.Thus the sample entropy is defined as Since N cannot be 1, where N is the length of the data; m is the embedding dimension, r is the threshold (is calculated as r = c � σ, where, σ is the standard deviation of the original sequence).
The conditional probability (CP) is defined as There is no a priori parameter setting precedure currently.According to the evaluation by Lake et al. [28] for the neonatal heart rate data, m should be 2 or 3; c is between 0.1 and 0.25.The above choice of parameters is widely accepted [35,44,45].In this study, we adopted typical values of embedding dimension m and threshold r, i.e., m = 2, r = 0.15σ.We also compared the effects of different choices of m (m = 2 or 3) and r (in the range of 0.1σ and 0.25σ).
Theoretical estimation of sample entropy variance.The theoretical variance of sample entropy can be numerically calculated [28] as where K A is the number of pairs of matching templates of length m + 1 that overlap and K B is the number of pairs of matching templates of length m that overlap.

Other entropy measures
To test the feasibility of the proposed Kalman filtering, we also tried two additional entropy measures.Approximate entropy.Approximate entropy is introduced by Pincus et al [16] and is briefly summarized here to show the principles and parameter choice.Let the raw data sampled at equal event intervals be u(i Define the distance d[x(i), x(j)] between x(i) and x(j) to be the one with the largest difference between the corresponding elements of the two, i.e., where u(a) is an element in the vector x.For each value of i calculate the distance d[x(i), x(j)] between x(i) and the other vectors x(j), j = 1, 2, � � �, N − m + 1.
According to the given threshold r(r > 0), for 1 ⩽ i ⩽ N − m + 1, the number of d[x(i), x(j)] < r is counted for each value of i and the ratio to the total number of vectors N − m + 1 is denoted as B m i ðmÞ.
Take the logarithm of B m i ðmÞ first, and then find its average over all i.denoted as, Then increase the dimension m to m + 1 to get F m+1 (r).Thus the approximate entropy is defined as where N is the length of the data; m is the embedding dimension, r is the threshold.
There is no a priori parameter setting procedure currently.According to the evaluation by Bajić et al [46], m should be 2 or 3; r is between 0.1σ and 0.25σ (σ is the standard deviation of the time series u(i)).In this study, we adopted typical values of embedding dimension m = 2 and threshold r = 0.15σ.
Neural network entropy.Recently, a new entropy measure has been proposed, namely, neural network entropy (NNetEn), which is a neural network-based technique for entropy estimation of time series data [17].Unlike conventional entropy measures, NNetEn does not consider a probability distribution, and depends on only one parameter, number of epochs in the LogNNet model [47].We used Method 1 (Row-wise filling) with duplication and set epoch to 20 to calculate the NNetEn entropy.

Kalman filtering
Kalman filtering is an algorithm that utilizes the state equation of a linear system to optimally estimate and predict the state of the system by means of the input and output observations of the system [48].Since the observation data include the influence of noise and interference in the system, the optimal estimation can also be regarded as a filtering (smoothing) process.The principle is to use the Kalman gain to correct the state prediction value to make it close to the real value.It consists of two main steps: prediction and update.The algorithm is divided into the following steps: A system is represented by a discrete state-space equation as where X k is the state variable, the "idea" entropy which is free from variance interference; Z k is the measurement variable, which is normally calculated from an algorithm (e.g., sample entropy, approximate entropy, NNetEn entropy).W k−1 and V k are the noises in the system and in the measurement process, which obey normal distributions with zero mean and covariance matrices Q, R, respectively, i.e., W k ∽ P(0, Q), V k ∽ P(0, R).U k is the control variable that relates the optional control input to the state variable X k .A is the state transfer matrix that relates the state at the previous time step to the state at the current step.B is the control matrix that relates the optional control input to the state X k .H is the state observation matrix, which relates the state to the measurements.The Kalman algorithm is divided into two steps, prediction and update.Prediction: estimate the state at the current moment, namely moment k, based on the a posteriori estimate b X kÀ 1 of the previous moment k − 1, and get the a priori estimate b Based on the covariance matrix P k−1 of the error e k−1 at moment k − 1 and the covariance matrix Q of the process noise w, the covariance matrix P À K of the error e k at moment k of the prediction is taken, which is.
Update: Correct the prediction stage estimates using the current moment measurements to get the current moment a posteriori estimates.
Calculate the Kalman gain coefficient V k at moment k.that is Corrective updating of the state using the Kalman gain yields the estimate b The iteration that estimates the optimal value at the k + 1 moment performs the update operation.That is where I is the identity matrix.
In this study, since we are dealing with scalar variables, the matrix A, B, H, P, etc can then be expressed as scalars, A, B, H, P, etc.Similarly, covariance matrices Q, R become variance of system Q, and variance of measurement R. A and H are set to 1; the control variable U k is set to 0. Therefore, the linear dynamic system can be described by state equation And the corresponding Kalman filtering process can be described as We set X 0 to Z 0 , Q to 0.1, and R to 0.5.

Benchmark smoothing methods
Moving average.Moving average is a classical filtering algorithm, whose main idea is to process the signal with a sliding window, in which the data are averaged.Moving average can reduce the variance of the signal by eliminating the periodic noise [49].
In this study, we apply the moving average algorithm to sample entropy values.The specific steps are as follows: • Define the size of each window, i.e. the number of entropy points contained; the window size is defined as 5.
• Move the window to the next position from the starting point of the entropy values.
• Calculate the mean value of the entropy values within each window.
• Update entropy values.Use the calculated mean value to represent the entropy values within the window.
A series of smoothed entropy values can be obtained by constantly moving the window and calculating the mean values.These smoothed entropy values reduce the measurement noise of entropy.
Exponentially weighted moving average.Exponentially weighted moving average (EWMA) is a commonly used time series data smoothing technique [30].The core idea of EWMA is that it gives a higher weight to the nearest data point, while its weight decreases exponentially as the data point gets further away from the current time point.The calculation formula is as follows [30] • Initialize the first EWMA value as the first data point.
• For each time point t, the EWMA value is calculated according to the following formula: where EMA(t) represents the EWMA value at time point t, EMA(t − 1) represents the EWMA value at time point t − 1, x(t) represents the observed value at time point t, and α is the smoothing factor (usually a decimal number between (0,1)).The initial value of α is set to 1.
• The steps above are repeated until the EWMA values for all time points have been calculated.

Variance reduction rate
We define the variance reduction rate (VRR) as where V before stands for the variance of the entire time series of sample entropy measure calculated over the EEG waveforms; and V after stands for the variance of the entire time series of sample entropy measures that have been smoothed by Kalman filer.

Statistical analysis
This distribution of entropy time series was tested using Shapiro-Wilk Normality test.Oneway ANOVA was performed to examine if the significant differences existed between the four groups of thresholds r = 0.1σ, r = 0.15σ, r = 0.2σ, r = 0.25σ, as well as to examine if there are significant differences in VRR values for parameters Q and P. If there is a statistically significant difference between the means of groups, Bonferonni was used for post hoc multilpe comparisons.

Simulations
In the simulations, we applied the Kalman filtering to reduce measuring variance of entropy on power noise.We generated power noise on β = 0 (

EEG of sleep-EDF expanded database
We analyzed the signals from selected EEG data from sleep-EDF Database Expanded.We also calculated the sample entropy time series (marked as Original), and then applied Kalman filtering to produce the smoothed sample entropy time series (marked as Kalman filtering).Fig 4 visiually shows that Kalman filtering reduces the variance for all four subjects.VRR values by Kalman filtering were 33.16%, 51.06%, 52.73% and 39.14% for the four subjects (Table 1).
In parallel, we also compared the original sample entropy values with the ones smoothed by Kalman filtering, moving average and EWMA for the four subjects respectively (Fig 5).Table 1 shows that VRR vaules are highest for Kalman filtering comparing to moving average and EWMA for most patients.At the same time, we also compared entropy time series smoothed by Kalman filtering with the original sample entropy, as well as sample entropy smoothed by moving average and   Table 2 shows highest VRR values by KF and lowest by MA.
In terms of time consumption, different brain regions show that Kalman filtering takes significantly less time than moving average and EWMA while moving average and EWMA are similar (Fig 8e -8h).

Discussion
Entropy is a measure of the amount of uncertainty associated with a variable [20].Most studies on the entropy measurement of EEG focus on the values of entropy at some specific moments while in some circumstances, continuous monitoring of entropy is crucial, e.g., in monitoring depth of anesthesia [5], and epileptic EEG activity [10].However, entropy as a measure suffers from the inherent noise, which is similar to a sensor's measurement noise [27].We have validated the feasibility that Kalman filtering can be used to smooth entropy time series obtained with a sliding window on simulated time series (power noise, Logistic map signals and Ro ¨slar To investigate the power of different smoothing methods, we have compared Kalman filtering to moving average, EWMA for sleep signals (Figs 5 and 6) and epilepsy data (Fig 8).The results show that VRR vaules are highest for filtering comparing to moving average and EWMA (Tables 1 and 2).Moving average and EWMA are commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles.There are other smoothing techniques, e.g., median average filtering or noise smoothing method based on wavelet thresholding techniques [50].In addition to sample entropy, Kalman filtering can also be to other entropy measures.This is verified by two entropy measures as examples, one is a long-lasting entropy, i.e., approximate entropy [16], and the other is a newly invented one, i.e., NNetEn [47].We have shown the of Kalman filtering to these two entropy measures (Figs 9 and 10).VRR values for NNetEn entropy are higher than those for approximate entropy and sample entropy, indicating that NNetEn has lower measurement variance than approximate entropy for both sleep and epilepsy signals.We acknowledge that there exist a lot entropy measures (e.g., dispersion entropy [18], permutation entropy [51,52], fuzzy entropy [53]), and expect a future exploration of applicability to other entropy measures.
have set the transition matrix A and measurement matrix H to identity matrix.This is in accordance with the literature where, for the physiological time-series, the transition matrix is usually replaced by an identity matrix [54,55].
For a typical selection of m = 2, there is no significant difference in the VRR of Kalman filtering in the range of r = 0.15 * 0.25 (Fig 11).This means it is up to the users to choose r and we would suggest r = 0.15 to follow conventions [35,45].
The two hyperparameters Q and R are covariances that can affect the performance of the Kalman filter (Fig 12).The process noise covariance Q reflexes the change of entropy measure.When Q is zero, there is no change at all for the entropy measure; on the other extreme, when Q is big enough, the entropy is allowed to change freely, which will lead to abrupt spikes.In this study, we have empirically set values of and R. We notice a series of work using adaptive methods in the determination of the two values [56].In future work, we will explore adaptive methods to optimize these two hyperparameters.
Due to the limit of paper length, the signals used in this study are limited to power noise, Logistic map signals and Ro ¨sler signals.We note that there are some other synthetic signals that can be used for the evaluation of an algorithm, e.g., corrupted deterministic signal (MIX process) [6].
our best knowledge, this is the first study that uses the Kalman filtering to track the change of entropy-based measures.The reason might be that the entropy-based measures are nonlinear methods that can quantify the rate of generation of new information [57], while at the same time, the Kalman filtering, in its initial form, is used for linear process.The rationale behind our proposal is that the entropy is a physical measure (despite of its complex nature) which may evolve with linear rules.

Conclusion
Estimating entropy has been known to suffer from variance that arises from its calculation, producing measurement noise.We have for the first time addressed this issue and have used Kalman filtering technique to reduce the measurement noise.Kalman filtering is expected to be used to reduce measurement noise when continuous entropy estimation (for example anaesthesia monitoring) is essential with high accuracy and low time-consumption.
Fig 1a), β = 0.5 (Fig 1b), β = 1 (Fig 1c) for three examples.In order to quantify the differences before and after Kalman filtering, we first calculated three entropy time series.Then we calculated the theoretical variance in the three examples (Fig 1d-1f).Afterwards we applied Kalman filtering to reduce the measurement noise.Compared with the original entropy time series, VRR values of smoothed entropy time series by Kalman filtering were 73.97%, 79.87%, and 79.29% correspondingly.

Fig 1 .
Fig 1. Kalman filtering on sample entropy time series of different power noise.The first row (a-c): the generated waveforms of power noise with different parameters, β = 0 (white noise), 0.5 (pink noise), and 1 (1/f noise).Segments of 2 seconds were shown for clarity.The second row (d-f): theoretical variances of sample entropy of the corresponding waveforms (a-c), which were calculated from Eq 10.The third row (g-i): the original and smoothed sample entropy time series of the corresponding waveforms (a-c) by Kalman filtering.https://doi.org/10.1371/journal.pone.0305872.g001 Fig 4   shows four subjects with subject numbers: SC4001E0, SC4011E0, SC4021E0 and SC4032E0.A

Fig 3 .Fig 2 .
Fig 3. Kalman filtering on sample entropy time series of different Ro ¨ssler system signals.The first row (a-c): the generated waveforms of logistic map with different parameters, c = 2.5, 4, 5.7.The second row (d-f): the theoretical variance of sample entropy time series of the corresponding waveforms (a-c) calculated from Eq 10.The third row (gi): the original and smoothed sample entropy time seires of the corresponding waveforms (a-c) by Kalman filtering.https://doi.org/10.1371/journal.pone.0305872.g003 Fig 6 shows that moving average takes more time than Kalman filtering or EWMA; while Kalman filtering and EWMA are close.EEG of CHB-MIT EEG segments from four channels of a one-hour recording that contain seizures are illustrated in Fig 7, which are: Fp1-F3 (Fig 7a), F3-C3 (Fig 7b), C3-P3 (Fig 7c), and P3-O1 (Fig 7d).The seizure activities are highlighted in the insets (first row of Fig 7).The theoretical variance values of the sample entropy for the four channels were calculated according to Eq 10 (second

Fig 4 .
Fig 4. Kalman filtering on sample entropy time series of sleep signals.(a-d) demonstrate the smoothing effect on S1, S2, S3 and W stages for subjects SC4001E0, SC4011E0, SC4021E0, and SC4032E0.First column: EEG waveforms; second column: theoretical variance computed by Eq 10; third row: original sample entropy time series and the sample entropy time series after smoothing by Kalman filtering.S1: sleep stage 1; S2: sleep stage 2; S3: sleep stage 3; W: wake stage.From the third column, it is visually clear that Kalman filtering reduces the variance of sample entropy measures; detailed values, see Table 1.
Fig 4. Kalman filtering on sample entropy time series of sleep signals.(a-d) demonstrate the smoothing effect on S1, S2, S3 and W stages for subjects SC4001E0, SC4011E0, SC4021E0, and SC4032E0.First column: EEG waveforms; second column: theoretical variance computed by Eq 10; third row: original sample entropy time series and the sample entropy time series after smoothing by Kalman filtering.S1: sleep stage 1; S2: sleep stage 2; S3: sleep stage 3; W: wake stage.From the third column, it is visually clear that Kalman filtering reduces the variance of sample entropy measures; detailed values, see Table 1.
https://doi.org/10.1371/journal.pone.0305872.t001EWMA.As shown in Fig 8a-8d, moving average and EWMA moderately reduce variance of the original entropy values, while Kalman filtering suppresses the variance to a larger extent.
Kalman filtering is applied to approximate entropy and NNetEn entropy for sleep signals(Fig 9)  and epilepsy signals (Fig 10).VRR values for NNetEn entropy are higher than those for approximate entropy and sample entropy, indicating that NNetEn has lower measurement variance for both sleep and epilepsy signals.

Fig 11
Fig 11 shows the effects of threshold r and embedding dimension m on VRR smoothed by Kalman filtering on sample entropy time series.For m = 2, the VRR values for r = 0.15, r = 0.15,

Table 1 . Comparison of VRR by different methods for sleep data.
SampEn stands for sample entropy; MA stands for moving average; KF stands for Kalman filtering.

Table 2 . Comparison of VRR by different methods in for epilepsy data.
Note: SampEn stands for sample entropy; MA stands for moving average; KF stands for Kalman filtering.https://doi.org/10.1371/journal.pone.0305872.t002